Weighting Features
Abstract
Many case-based reasoning algorithms employ derivatives of the k-nearest neighbor (k-NN) classifier for case retrieval. Several studies have shown that its similarity function is sensitive to imperfect feature sets (i.e., those containing irrelevant, redundant, interacting, or noisy features). Many proposed methods attempt to reduce this sensitivity by parameterizing k-NN's similarity function with feature weights. We focus on the subset of these methods that automatically assign weight settings using little or no domain-specific knowledge. Our goal is to understand the relative capabilities of these methods for given sets of conditions. We first describe a five-dimensional framework that categorizes automated weight-setting methods. Next, we empirically compare methods along one of these dimensions. Finally, we summarize our results with four hypotheses and describe additional empirical evidence to support them. Our investigation revealed that most methods correctly assign low weights to completely irrelevant features, and that methods which use performance feedback to assign weight settings demonstrate three advantages over other methods (i.e., they require less pre-processing, perform better in the presence of interacting attributes, and support faster learning rates).
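As a rough illustration of what parameterizing k-NN's similarity function with feature weights can mean, the following minimal sketch assumes a weighted Euclidean distance. The function names, the toy cases, and the hand-picked weights are all hypothetical and are not taken from the paper; the automated methods the abstract refers to would derive such weights from data rather than set them by hand.

```python
import math
from collections import Counter

def weighted_distance(x, y, weights):
    """Euclidean distance with each squared feature difference scaled by its weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def knn_classify(query, cases, labels, weights, k=3):
    """Return the majority label among the k stored cases nearest to the query."""
    ranked = sorted(
        range(len(cases)),
        key=lambda i: weighted_distance(query, cases[i], weights),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy case base: feature 0 determines the class,
# feature 1 is irrelevant noise on a larger scale.
cases = [(0.0, 9.0), (0.1, 3.0), (0.2, 5.0), (1.0, 0.9), (0.9, 8.0), (1.1, 4.0)]
labels = ["a", "a", "a", "b", "b", "b"]
query = (0.15, 0.9)

# Equal weights let the irrelevant feature dominate the distance.
uniform = knn_classify(query, cases, labels, weights=(1.0, 1.0), k=1)
# A zero weight on the irrelevant feature removes its influence.
weighted = knn_classify(query, cases, labels, weights=(1.0, 0.0), k=1)
print(uniform, weighted)
```

In this toy setting the equally weighted retrieval is pulled toward a case that happens to match on the noisy feature, while assigning that feature a low weight restores the intended neighbor.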
Similar resources
Image Retrieval Using Dynamic Weighting of Compressed High Level Features Framework with LER Matrix
In this article, a fabulous method for database retrieval is proposed. The multi-resolution modified wavelet transform of each image is computed, and the standard deviation and average are used as the textural features. Then, the proposed modified bit-based color histogram and edge detectors are used to define the high-level features. A feedback-based dynamic weighting of shap...
A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection
The K nearest neighbor algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affect the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...
Weighting Unusual Feature Types
Feature weighting is known empirically to improve classification accuracy for k-nearest neighbor classifiers in tasks with irrelevant features. Many feature weighting algorithms are designed to work with symbolic features, or numeric features, or both, but cannot be applied to problems with features that do not fit these categories. This paper presents a new k-nearest neighbor feature weighting...
An Improved Approach to Term Weighting in Hierarchical Web Page Classification
Currently, in web page classification, the Absolute Weighting Method is a common way to weight HTML main structure features. The disadvantage of the method is that the weighting coefficient is a fixed value, which has different effects on long and short text. As a result, the influence of structure features on local text weakens as the length of the local text increases. To solve the problem, we p...
Feature Weighting for Segmentation
This paper proposes the use of feature weights to reveal the hierarchical nature of music audio. Feature weighting has been exploited in machine learning, but has not been applied to music audio segmentation. We describe both a global and a local approach to automatic feature weighting. The global approach assigns a single weighting to all features in a song. The local approach uses the local s...
The Open University's repository of research publications and other research outputs: Feature weighting methods for abstract features applicable to motion based video indexing
Content-based labels, associated with image sequences in contemporary video indexing methods, can be textual, numerical, or abstract, including colour histograms and motion co-occurrence matrices. Abstract features or indices are not explicitly numeric entities but rather are composed of numeric entities. When multiple abstract features are involved, distance metrics between image sequenc...